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Abstract — We  propose  a  new  method  for  reconstruction  of 
sparse  signals  with  and  without  noisy  perturbations,  termed  the 
subspace  pursuit  algorithm.  The  algorithm  has  two  important 
characteristics:  low  computational  complexity,  comparable  to 
that  of  orthogonal  matching  pursuit  techniques,  and  reconstruc¬ 
tion  accuracy  of  the  same  order  as  that  of  LP  optimization 
methods.  The  presented  analysis  shows  that  in  the  noiseless 
setting,  the  proposed  algorithm  can  exactly  reconstruct  arbitrary 
sparse  signals  provided  that  the  sensing  matrix  satisfies  the 
restricted  isometry  property  with  a  constant  parameter.  In  the 
noisy  setting  and  in  the  case  that  the  signal  is  not  exactly  sparse, 
it  can  be  shown  that  the  mean  squared  error  of  the  reconstruction 
is  upper  bounded  by  constant  multiples  of  the  measurement  and 
signal  perturbation  energies. 

Index  Terms — Compressive  sensing,  orthogonal  matching  pur¬ 
suit,  reconstruction  algorithms,  restricted  isometry  property, 
sparse  signal  reconstruction 

I.  Introduction 

Compressive  sensing  (CS)  is  a  method  closely  connected 
to  transform  coding,  a  compression  technique  widely  used 
in  modern  communication  systems  involving  large  scale  data 
samples.  A  transform  code  converts  input  signals,  embedded 
in  a  high  dimensional  space,  into  signals  that  lie  in  a  space  of 
significantly  smaller  dimension.  Examples  of  transform  coders 
include  the  well  known  wavelet  transforms  and  the  ubiquitous 
Fourier  transform. 

Compressive  sensing  techniques  perform  transform  cod¬ 
ing  successfully  whenever  applied  to  so-called  compressible 
and/or  K -sparse  signals,  i.e.,  signals  that  can  be  represented  by 
K  <C  N  significant  coefficients  over  an  IV- dimensional  basis. 
Encoding  of  a  AT-sparse,  discrete-time  signal  x  of  dimension 
N  is  accomplished  by  computing  a  measurement  vector  y 
that  consists  of  to  <C  N  linear  projections  of  the  vector  x, 
compactly  described  via 

y  =  3>x. 

Here,  represents  an  m  x  N  matrix,  usually  over  the  field 
of  real  numbers.  Within  this  framework,  the  projection  basis 
is  assumed  to  be  incoherent  with  the  basis  in  which  the  signal 
has  a  sparse  representation  [2], 
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Although  the  reconstruction  of  the  signal  x  £  from  the 
possibly  noisy  random  projections  is  an  ill-posed  problem,  the 
strong  prior  knowledge  of  signal  sparsity  allows  for  recovering 
x  using  to  <C  N  projections  only.  One  of  the  outstanding 
results  in  CS  theory  is  that  the  signal  x  can  be  reconstructed 
using  optimization  strategies  aimed  at  finding  the  sparsest 
signal  that  matches  with  the  to  projections.  In  other  words, 
the  reconstruction  problem  can  be  cast  as  an  Zq  minimization 
problem  [3],  It  can  be  shown  that  to  reconstruct  a  A'-sparse 
signal  x,  if  minimization  requires  only  to  =  K  +  1  random 
projections  when  the  signal  and  the  measurements  are  noise- 
free.  Unfortunately,  solving  the  if  optimization  is  known  to 
be  NP-hard.  This  issue  has  led  to  a  large  body  of  work  in  CS 
theory  and  practice  centered  around  the  design  of  measurement 
and  reconstruction  algorithms  with  tractable  reconstruction 
complexity. 

The  work  by  Donoho  and  Candes  et.  al.  [2],  [4]— [6] . 
demonstrated  that  CS  reconstruction  is,  indeed,  a  polynomial 
time  problem  -  albeit  under  the  constraint  that  more  than  A' +1 
measurements  are  used.  The  key  observation  behind  these 
findings  is  that  it  is  not  necessary  to  resort  to  if  optimization 
to  recover  x  from  the  under-determined  inverse  problem;  a 
much  easier  f  optimization,  based  on  Linear  Programming 
(LP)  techniques,  yields  an  equivalent  solution,  as  long  as  the 
sampling  matrix  <l>  satisfies  the  so  called  restricted  isometry 
property  (RIP)  with  a  constant  parameter. 

While  LP  techniques  play  an  important  role  in  designing 
computationally  tractable  CS  decoders,  their  complexity  is 
still  highly  impractical  for  many  applications.  In  such  cases, 
the  need  for  faster  decoding  algorithms  -  preferably  operating 
in  linear  time  -  is  of  critical  importance,  even  if  one  has 
to  increase  the  number  of  measurements.  Several  classes  of 
low-complexity  reconstruction  techniques  were  recently  put 
forward  as  alternatives  to  linear  programming  (LP)  based 
recovery,  which  include  group  testing  methods  [7],  and  al¬ 
gorithms  based  on  belief  propagation  [8], 

Recently,  a  family  of  iterative  greedy  algorithms  received 
significant  attention  due  to  their  low  complexity  and  simple 
geometric  interpretation.  They  include  the  Orthogonal  Match¬ 
ing  Pursuit  (OMP),  the  Regularized  OMP  (ROMP)  and  the 
Stagewise  OMP  (StOMP)  algorithms.  The  basic  idea  behind 
these  methods  is  to  find  the  support  of  the  unknown  signal 
sequentially.  At  each  iteration  of  the  algorithms,  one  or  several 
coordinates  of  the  vector  x  are  selected  for  testing  based 
on  the  correlation  values  between  the  columns  of  3>  and 
the  regularized  measurement  vector.  If  deemed  sufficiently 
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reliable,  the  candidates  are  subsequently  added  to  the  current 
estimate  of  the  support  set  of  x.  The  pursuit  algorithms  iterate 
this  procedure  until  all  the  coordinates  in  the  correct  support 
are  in  the  estimated  support.  The  computational  complexity  of 
OMP  strategies  depends  on  the  number  of  iterations  needed 
for  exact  reconstruction:  standard  OMP  always  runs  through 
K  iterations,  and  therefore  its  reconstruction  complexity  is 
roughly  O  (KmN).  This  complexity  is  significantly  smaller 
than  that  of  LP  methods,  especially  when  the  signal  sparsity 
level  K  is  small.  However,  the  pursuit  algorithms  do  not  have 
provable  reconstruction  quality  of  the  level  of  LP  methods.  For 
OMP  techniques  to  operate  successfully,  one  requires  that  the 
correlation  between  all  pairs  of  columns  of  3>  is  at  most  1/2K 
[9],  which  by  the  Gershgorin  Circle  Theorem  [10],  represents 
a  more  restrictive  constraint  than  the  RIP.  The  ROMP  algo¬ 
rithm  [11]  can  reconstruct  all  /C-sparse  signals  provided  that 
the  RIP  holds  with  parameter  5^k  <  0.06/-\/log  K,  which 
strengthens  the  RIP  requirements  for  l\ -linear  programming 
by  a  factor  of  <J\ogK. 

The  main  contribution  of  this  paper  is  a  new  algorithm, 
termed  the  subspace  pursuit  (SP)  algorithm,  which  exhibits 
low  reconstruction  complexity  of  matching  pursuit  techniques, 
but  has  provable  reconstruction  capability  comparable  to  that 
of  LP  methods.  The  algorithm  can  operate  both  in  the  noiseless 
and  noisy  regime,  allowing  for  exact  and  approximate  signal 
recovery,  respectively.  For  any  sampling  matrix  4>  satisfying 
the  RIP  with  a  constant  parameter  independent  on  I\,  the 
SP  algorithm  can  recover  arbitrary  /t-sparse  signals  exactly 
from  its  noiseless  measurements.  When  the  measurements 
are  inaccurate  and/or  the  signal  is  not  sufficiently  sparse, 
the  reconstruction  distortion  is  upper  bounded  by  a  constant 
multiple  of  the  measurement  and/or  signal  perturbation  en¬ 
ergy.  The  computational  complexity  of  the  SP  algorithm  is 
upper  bounded  by  O  (mNK),  but  can  be  further  reduced  to 
O  (mN  log  K)  when  the  nonzero  entries  of  the  sparse  signal 
decay  slowly. 

The  basic  idea  behind  the  SP  algorithm  is  borrowed  from 
sequential  coding  theory  with  backtracking,  more  precisely, 
the  A*  order-statistic  algorithm  [12],  In  this  decoding  frame¬ 
work,  one  first  selects  a  set  of  K  codewords  of  highest 
reliability  that  span  the  codespace.  If  the  distance  of  the 
received  vector  to  this  space  is  deemed  large,  the  algorithm 
incrementally  removes  and  adds  new  basis  vectors  according 
to  their  reliability  values,  until  a  sufficiently  close  candidate 
codeword  is  identified.  SP  employs  a  similar  strategy,  except 
for  the  fact  that  at  each  step,  the  same  number  of  vectors 
is  expurgated  from  the  candidate  list.  This  feature  is  mainly 
introduced  for  simplicity  of  analysis:  one  can  easily  extend 
the  algorithm  to  include  adaptive  expurgation  strategies  that 
do  not  necessarily  work  with  fixed-sized  lists. 

In  compressive  sensing,  the  major  challenge  associated  with 
sparse  signal  reconstruction  is  to  identify  in  which  subspace, 
generated  by  not  more  than  K  columns  of  the  matrix  <1>. 
the  measured  signal  y  lies  in.  Once  the  correct  subspace  is 
determined,  the  non-zero  signal  coefficients  are  calculated  by 
applying  the  pseudoinversion  process.  The  defining  character 
of  the  SP  algorithm  is  the  method  used  for  finding  the  K 
columns  that  span  the  correct  subspace:  SP  tests  subsets  of 


K  columns  in  a  group,  for  the  purpose  of  refining  at  each 
stage  an  initially  chosen  estimate  for  the  subspace.  More 
specifically,  the  algorithm  maintains  a  list  of  I\  columns  of  <1>, 
performs  a  simple  test  in  the  spanned  space,  and  then  refines 
the  list.  If  y  does  not  lie  in  the  current  estimate  for  the  correct 
spanning  space,  one  refines  the  estimate  by  retaining  reliable 
candidates,  discarding  the  unreliable  ones  while  adding  the 
same  number  of  new  candidates.  The  “reliability  property”  is 
captured  in  terms  of  the  order  statistics  of  the  inner  products 
of  the  received  signal  with  the  columns  of  <I>,  and  the  subspace 
projection  coefficients. 

As  a  consequence,  the  main  difference  between  ROMP  and 
the  SP  reconstruction  strategy  is  that  the  former  algorithm 
generates  a  list  of  candidates  sequentially,  without  back- 
tracing:  it  starts  with  an  empty  list,  identifies  one  or  several 
reliable  candidates  during  each  iteration,  and  adds  them  to 
the  already  existing  list.  Once  a  coordinate  is  deemed  to  be 
reliable  and  is  added  to  the  list,  it  is  not  removed  from  it 
until  terminating  the  algorithm.  This  search  strategy  is  overly 
restrictive,  since  candidates  have  to  be  selected  with  extreme 
caution.  In  contrast,  the  SP  algorithm  incorporates  a  simple 
method  for  re-evaluating  the  reliability  of  all  candidates  at 
each  iteration  of  the  process. 

The  remainder  of  the  paper  is  organized  as  follows.  Sec¬ 
tion  II  introduces  relevant  concepts  and  terminology  for  de¬ 
scribing  the  proposed  CS  reconstruction  technique.  Section  III 
contains  the  algorithmic  description  of  the  SP  algorithm,  along 
with  a  simulation-based  study  of  its  performance  when  com¬ 
pared  to  OMP,  ROMP,  and  LP  methods.  Section  IV  contains 
the  main  result  of  the  paper  pertaining  to  the  noiseless  setting: 
a  formal  proof  for  the  guaranteed  reconstruction  performance 
and  the  reconstruction  complexity  of  the  SP  algorithm.  Sec¬ 
tion  V  contains  the  main  result  of  the  paper  pertaining  to  the 
noisy  setting.  Concluding  remarks  are  given  in  Section  VI, 
while  proofs  of  most  of  the  theorems  are  presented  in  the 
Appendix  of  the  paper. 

II.  Preliminaries 

A.  Compressive  Sensing  and  the  Restricted  Isometry  Property 

Let  supp(x)  denote  the  set  of  indices  of  the  non-zero 
coordinates  of  an  arbitrary  vector  x  =  (ati, . . .  ,Xn ),  and  let 
|supp(x)|  =  ||  •  ||  denote  the  support  size  of  x,  or  equivalently, 
its  norm  1 .  Assume  next  that  x  £  is  an  unknown  signal 
with  |supp(x)  |  <  K,  and  let  y  £  Rm  be  an  observation  of  x 
via  M  linear  measurements,  i.e., 

y  =  $x, 

where  3>  £  RmxJV  is  henceforth  referred  to  as  the  sampling 
matrix. 

We  are  concerned  with  the  problem  of  low-complexity 
recovery  of  the  unknown  signal  x  from  the  measurement  y. 
A  natural  formulation  of  the  recovery  problem  is  within  an  //, 
norm  minimization  framework  which  seeks  a  solution  to  the 
problem 

min  ||x||0  subject  to  y  =  <l>x. 

'We  interchangeably  use  both  notations  in  the  paper. 
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Unfortunately,  solving  the  above  £q  minimization  problem  is 
NP-hard  and  therefore  not  practical  [4],  [5], 

One  way  to  avoid  using  this  computationally  intractable  for¬ 
mulation  is  to  refer  to  an  i\  -regularized  optimization  settings, 
i.e., 

min  Hxllj  subject  to  y  =  4>x, 

where 

N 

llxlli  =  1^1 

1=1 

denotes  the  t\  norm  of  the  vector  x. 

The  main  advantage  of  the  I\  minimization  approach  is  that 
it  is  a  convex  optimization  problem  that  can  be  solved  effi¬ 
ciently  by  linear  programming  (LP)  techniques.  This  method 
is  therefore  frequently  referred  to  as  t\  -LP  reconstruction,  and 
its  reconstruction  complexity  equals  O  ( N 3)  [4],  [13], 

The  reconstruction  accuracy  of  the  l\  -  LP  method  is  de¬ 
scribed  in  terms  of  the  so  called  restricted  isometry  property 
(RIP),  formally  defined  below. 

Definition  1  (Truncation):  Let  4>  G  Rmx,v  and  let  I  C 
{ 1 ,  -  •  •  ,  N}.  The  matrix  4>/  consists  of  the  columns  of  4> 
with  indices  i  G  I.  The  space  spanned  by  the  columns  of  4>/ 
is  denoted  by  span(4>/). 

Definition  2  (RIP):  A  matrix  $  G  RmxW  js  sajc|  to  sa(jsfy 
the  Restricted  Isometry  Property  (RIP)  with  parameters  (K ,  6) 
for  K  <  m,  0  <  6  <  1,  if  for  all  index  sets  I  C  {1,  ■  ■  ■  ,  N} 
such  that  |/|  <  K  and  for  all  q  G  RlJl,  one  has 

(i -^llqlla^lMa  <(i  +  «)  IN'- 

We  define  6k  to  be  the  infimum  of  all  parameters  <5  for 
which  the  RIP  holds,  i.e. 

SK  :=  inf  |<5  :  (1  -  (5)  ||q||2  <  H^/qlla  <  (1  +  8)  ||q||2  , 

V  |/|  <K,  VqGM|J|}. 

Remark  1  (RIP  and  eigenvalues):  If  a  sampling  matrix 
4?  G  RmxAr  satisfies  the  RIP  with  parameters  ( K,6k ),  then 
for  all  I  C  {1,  •  •  •  ,  N}  such  that  |/|  <  K,  it  holds  that 

1  -  6K  <  Amin  (*?*/)  <  Amax  ($1$/)  <  1  +  6K, 

where  Am;n  and  Amax  denote  the  minimum  and  maximum 
eigenvalues  of  4>,  respectively. 

Remark  2  (Matrices  satisfying  the  RIP):  Most  known  ex¬ 
amples  of  matrices  satisfying  the  RIP  property  with  optimal 
or  near-optimal  performance  guarantees  are  random.  Examples 
include: 

1)  Random  matrices  with  i.i.d.  entries  that  follow  either 
the  Gaussian  distribution,  Bernoulli  distribution  with 
zero  mean  and  variance  1/n,  or  any  other  distribution 
that  satisfies  certain  tail  decay  laws.  It  was  shown  in 
[13]  that  the  RIP  for  a  randomly  chosen  matrix  from 
such  ensembles  holds  with  overwhelming  probability 
whenever 

K<C  m 
~  log  (N / to)  ’ 

where  C  is  a  function  of  the  RIP  parameter. 


2)  Random  matrices  from  the  Fourier  ensemble.  Here,  one 
randomly  selects  m  rows  from  the  N  x  N  discrete 
Fourier  transform  matrix  uniformly  at  random.  Upon 
selection,  the  columns  of  the  matrix  are  scaled  to  unit 
norm.  The  resulting  matrix  satisfies  the  RIP  with  over¬ 
whelming  probability  provided  that 


K  <  C 


m 

(log  TV)6’ 


where  C  depends  only  on  the  RIP  parameter. 

There  exists  an  intimate  relationship  between  the  LP  recon¬ 
struction  accuracy  and  the  RIP  property,  first  described  by 
Candes  and  Tao  in  [4],  The  result  in  [4]  shows  that  if  the 
sampling  matrix  4>  satisfies  the  RIP  with  parameters  6k,  621c, 
and  63K,  such  that 


6k  +  62K  +  63K  <  1,  (1) 

then  the  fi-LP  algorithm  will  reconstruct  all  A'-sparse  signals 
exactly. 

For  our  subsequent  derivations,  we  need  two  results  summa¬ 
rized  in  the  lemma  below.  The  first  part  of  the  claim,  as  well 
as  a  related  modification  of  the  second  claim  also  appeared 
in  [4],  [11],  For  completeness,  we  include  the  proof  of  the 
lemma  in  Appendix  A. 

Lemma  1  (Consequences  of  RIP): 

1)  (Monotonicity  of  6k)  For  any  two  integers  K  <  K' , 

6k  <  6K'- 

2)  (Near  orthogonality  of  columns)  Let  I,  J  C  {1,  •  •  •  ,  N} 
be  two  disjoint  sets,  If)J=(f>.  Suppose  that  <5|/|+|j|  < 
1.  For  arbitrary  vectors  a  G  and  b  G 

|(4>/a,  4>jb)|  <  <5|/|+|j|  ||a||2  ||b||2  , 

and 

||^^.7b||2  <  <5^1+!./!  I|b||2  . 

The  lemma  implies  that  6k  <  62K  <  63K,  which  con¬ 
sequently  implies  that  63K  <  1/3  is  a  sufficient  condition 
for  exact  reconstruction  of  A'-sparse  signals.  Although  this 
condition  is  weaker  than  the  one  specified  in  Equation  (1), 
we  henceforth  focus  only  on  characterizing  the  performance 
and  complexity  of  the  SP  algorithm  with  respect  to  this 
parameter.  Our  motivation  for  slightly  weakening  this  RIP 
parameter  bound  is  to  simplify  the  notation  used  in  most  of 
the  proofs,  and  to  provide  a  fair  comparison  between  different 
reconstruction  strategies. 

In  order  to  describe  the  main  steps  of  the  SP  algorithm,  we 
introduce  next  the  notion  of  the  projection  of  a  vector  and  its 
residue. 

Definition  3  (Projection  and  Residue):  Let  y  G  Rm  and 
4>j  G  Suppose  that  is  invertible.  The  projection 

of  y  onto  span  (4>j)  is  defined  as 

yP  =  pr°j  (y,  &i)  ■=  M[y, 


where 
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denotes  the  pseudo-inverse  of  the  matrix  <1>  /,  and  *  stands  for 
matrix  transposition. 

The  residue  vector  of  the  projection  equals 
yr  =  resid  (y,  $/)  :=  y  -  yp. 


We  find  the  following  properties  of  projections  and  residues 
of  vectors  useful  for  our  subsequent  derivations. 

Lemma  2  (Projection  and  Residue): 

1)  (Orthogonality  of  the  residue)  For  an  arbitrary  vector 
y  G  Rm,  and  a  sampling  matrix  <l>/  G  RmxK  of  full 
column  rank,  let  yr  =  resid  (y,  T*  / ).  Then 

yr  =  o. 


2)  (Approximation  of  the  projection  residue)  Consider  a 
matrix  3>  G  RmxAr.  Let  J,  J  C  {1,  •  -  -  iV}  be  two 
disjoint  sets,  /  fj  J  =  <j>,  and  suppose  that  <5m+|j-i  <  1. 
Furthermore,  let  y  G  span  (<!?/),  yp  =  proj  (y,  <f> j)  and 
yr  =  resid  (y,  tFj).  Then 


and 


llyp||2  ^ 


(Wm 


llyll 


2  » 


1  ~  5\i\+\j\J 


l|y||2  <  l|yr||2  <  lly||2- 


The  proof  of  Lemma  2  can  be  found  in  Appendix  B. 


TCT 


(a)  Iterations  in  OMP,  Stagewise  OMP,  and  Regularized  OMP:  in  each 
iteration,  one  decides  on  a  reliable  set  of  candidate  indices  to  be  added 
into  the  list  T;  once  a  candidate  is  added,  it  remains  in  the  list  until 
the  algorithm  terminates. 


TCT 


(b)  Iterations  in  the  proposed  Subspace  Pursuit  Algorithm:  a  list  of  K  can¬ 
didates,  which  is  allowed  to  be  updated  during  the  iterations,  is  maintained. 


Figure  1.  Description  of  reconstruction  algorithms  for  /^-sparse  signals: 
though  both  approaches  look  similar,  the  basic  ideas  behind  are  quite  different. 


III.  The  SP  Algorithm 

The  main  steps  of  the  SP  algorithm  can  be  described  as 
follows. 

Algorithm  1  Subspace  Pursuit  Algorithm 

Input:  K,  «l>,  y 

Initialization: 

T  =  {K  indices  corresponding  to  the  largest  abso¬ 
lute  values  of  3>*y}. 

yr  =  resid  (y  ,&f). 

Iteration: 

If  yr  =  0,  quit  the  iteration;  otherwise  continue. 

T'  -  T  (J  { K  indices  corresponding  to  the  largest 
magnitudes  of  <&*yr}. 

Let  x',  =  $^,y. 

T  =  { K  indices  corresponding  to  the  largest  ele¬ 
ments  of  x/}. 

yr  =  resid  (y,  3>f) . 

If  llyrll  >  ||yr||,  quit  the  iteration;  otherwise,  let 
T  =  T  and  y,  =  yr,  and  continue  with  a  new 
iteration. 

Output: 

The  estimated  signal  x  satisfies  x^  n}-t  =  ^ 
and  xT  =  y. 


A  schematic  diagram  of  the  SP  algorithm  is  depicted  in 
Fig.  1(b).  For  comparison,  a  diagram  of  OMP-type  methods  is 
also  provided  in  Fig.  1(a).  The  subtle,  but  important,  difference 


between  the  two  schemes  lies  in  the  approach  used  to  generate 
T,  the  estimate  of  the  correct  support  set  T.  In  OMP  strategies, 
during  each  iteration  one  decides  the  algorithm  selects  one  or 
several  indices  that  represent  good  partial  support  set  estimates 
and  adds  them  to  T.  Once  an  index  is  added  into  T,  it  remains 
in  this  set  throughout  the  remainder  of  the  process.  As  a  result, 
strict  inclusion  rules  are  needed  to  ensure  that  a  significant 
fraction  of  the  newly  added  indices  belongs  to  the  correct 
support  T.  On  the  other  hand,  in  the  SP  algorithm,  an  estimate 
T  of  size  K  is  maintained  and  refined  during  each  iteration.  An 
index,  which  is  considered  reliable  in  some  iteration  but  shown 
to  be  wrong  at  a  later  iteration,  can  be  added  into  or  removed 
from  the  estimated  support  set  freely.  The  expectation  is  that 
the  recursive  refinements  of  the  estimate  of  the  support  set 
will  lead  to  subspaces  with  strictly  decreasing  distance  from 
the  measurement  vector  y. 

We  performed  extensive  computer  simulations  in  order  to 
compare  the  accuracy  of  different  reconstruction  algorithms 
empirically.  In  the  compressive  sensing  framework,  all  sparse 
signals  are  expected  to  be  exactly  reconstructed  as  long  as  the 
level  of  the  sparsity  is  below  a  certain  threshold.  For  empirical 
testing,  we  adopt  the  simulation  strategy  described  in  [6]  for 
simulating  the  empirical  frequency  of  exact  reconstruction. 
The  steps  of  the  testing  strategy  are  listed  below. 

1)  For  given  values  of  the  parameters  m  and  N,  choose  a 
signal  sparsity  level  K  such  that  K  <  m/2; 

2)  Randomly  generate  a  m .  x  N  sampling  matrix  <1>  from 
the  standard  i.i.d.  Gaussian  ensemble; 

3)  Select  a  support  set  T  of  size  |Tj  =  K  uniformly  at 
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random,  and  generate  the  sparse  signal  vector  x  by  either 
one  of  the  following  two  methods: 

a)  Draw  the  elements  of  the  vector  x  restricted  to  T 
from  the  standard  Gaussian  distribution;  we  refer 
to  this  type  of  signal  as  a  Gaussian  signal.  Or, 

b)  set  all  entries  of  x  supported  on  T  to  ones;  we 
refer  to  this  type  of  signal  as  zero-one  signal. 

Note  that  zero-one  sparse  signals  are  of  spatial  interest 
for  the  comparative  study,  since  they  represent  a  partic¬ 
ularly  challenging  case  for  OMP-type  of  reconstruction 
strategies. 

4)  Compute  the  measurement  y  =  <bx,  apply  a  reconstruc¬ 
tion  algorithm  to  obtain  an  estimate  of  x,  x,  and  compare 
x  to  x; 

5)  Repeat  the  process  500  times  for  each  K,  and  then 
simulate  the  same  algorithm  for  different  values  of  m 
and  N. 

The  improved  reconstruction  capability  of  the  SP  method, 
compared  to  that  of  the  OMP  and  ROMP  algorithms,  is 
illustrated  by  two  examples  shown  in  Fig.  2.  Here,  the  signals 
are  drawn  both  according  to  the  Gaussian  and  zero-one  model, 
and  the  benchmark  performance  of  the  LP  reconstruction 
technique  is  plotted  as  well. 

Figure  2  depicts  the  empirical  frequency  of  exact  reconstruc¬ 
tion.  The  numerical  values  on  the  a>axis  denote  the  sparsity 
level  K,  while  the  numerical  values  on  the  y-axis  represent 
the  fraction  of  exactly  recovered  test  signals.  Of  particular 
interest  is  the  sparsity  level  at  which  the  recovery  rate  drops 
below  100%  -  i.e.  the  critical  sparsity  -  which,  when  exceeded, 
leads  to  errors  in  the  reconstruction  algorithm  applied  to  some 
of  the  signals  from  the  given  class. 

The  simulation  results  reveal  that  the  critical  sparsity  of 
the  SP  algorithm  by  far  exceeds  that  of  the  OMP  and  ROMP 
techniques,  for  both  Gaussian  and  zero-one  inputs.  The  re¬ 
construction  capability  of  the  SP  algorithm  is  comparable  to 
that  of  the  LP  based  approach:  the  SP  algorithm  has  a  slightly 
higher  critical  sparsity  for  Gaussian  signals,  but  also  a  slightly 
lower  critical  sparsity  for  zero-one  signals.  However,  the  SP 
algorithms  significantly  outperforms  the  LP  method  when 
it  comes  to  reconstruction  complexity.  As  we  analytically 
demonstrate  in  the  exposition  to  follow,  the  reconstruction 
complexity  of  the  SP  algorithm  for  both  Gaussian  and  zero- 
one  sparse  signals  is  O  [mN  log  K).  At  the  same  time,  the 
complexity  of  LP  algorithms  based  on  interior  point  methods 
is  O  ( m2N 3/2)  [14], 

IV.  Recovery  of  Sparse  Signal 

For  simplicity,  we  start  by  analyzing  the  reconstruction 
performance  of  SP  algorithms  applied  to  sparse  signals  in 
the  noiseless  setting.  The  techniques  used  in  this  context,  and 
the  insights  obtained  are  also  applicable  to  the  analysis  of 
SP  reconstruction  schemes  with  signal  or/and  measurement 
perturbations. 

A  sufficient  condition  for  exact  reconstruction  of  arbitrary 
sparse  signals  is  stated  in  the  following  theorem. 

Theorem  1:  Let  x  €  R N  be  a  A-sparse  signal,  and  let 
its  corresponding  measurement  be  y  =  3>x  €  R'" .  If  the 


Reconstruction  Rate  (500  Realizations):  m=128,  N=256 


(a)  Simulations  for  Gaussian  sparse  signals:  OMP  and  ROMP  start  to  fail 
when  K  >  19  and  when  K  >  22  respectively,  ^i-LP  begins  to  fail  when 
K  >  35,  and  the  SP  algorithm  fails  only  when  K  >  45. 


Reconstruction  Rate  (500  Realizations):  m=128,  N=256 


(b)  Simulations  for  zero-one  sparse  signals:  both  OMP  and  ROMP  starts  to 
fail  when  K  >  10,  begins  to  fail  when  K  >  35,  and  the  SP  algorithm 
fails  when  K  >  29. 

Figure  2.  Simulations  of  the  exact  recovery  rate:  compared  to  OMPs,  the 
SP  algorithm  has  significantly  larger  critical  sparsity. 

sampling  matrix  3>  satisfies  the  RIP  with  parameter 

S3K  <  0.06,  (2) 

then  the  SP  algorithm  is  guaranteed  to  exactly  recover  x  from 
y  via  a  finite  number  of  iterations. 

This  sufficient  condition  is  proved  by  applying  Theorems  2 
and  6.  The  computational  complexity  is  related  to  the  number 
of  iterations  required  for  exact  reconstruction,  and  discussed 
at  the  end  of  Section  IV-C.  Before  we  go  to  the  details,  let  us 
sketch  the  main  ideas  behind  the  proof. 

As  before,  denote  the  estimate  of  supp  (x)  at  the  beginning 
of  a  given  iteration  by  T,  and  the  estimate  of  the  support  set  at 
the  end  of  the  iteration  by  T,  which  also  serves  as  the  estimate 
for  the  next  iteration.  Let 

Xq  =  xT-f  anC^  Xq  =  X T_f  ■ 
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T 


Figure  3.  Illustration  of  sets  and  signal  coefficient  vectors 


Figure  4.  After  each  iteration,  a  /^-dimensional  hyper-plane  closer  to  y  is 
obtained. 


The  vectors  xo  and  xo  represent  the  residual  signals  based 
upon  the  estimates  of  supp(x)  before  and  after  a  given  iteration 
of  the  SP  algorithm  is  completed,  respectively  (see  Fig.  3  for 
illustration).  Provided  that  the  sampling  matrix  <1?  satisfies  the 
RIP  with  constant  (2),  it  holds  that 

ll*oll2  <  ll*oll2, 

which  implies  that  at  each  iteration,  the  SP  algorithm  identifies 
a  A' -dimensional  space  that  reduces  the  reconstruction  error 
of  the  vector  x.  See  Fig.  4  for  an  illustration.  We  are  now 
ready  to  formally  state  this  observation  as  follows. 

Theorem  2:  Assume  that  the  conditions  of  Theorem  1  hold. 
For  each  iteration  of  the  SP  algorithm,  it  holds  that 


be  the  residue  signal  coefficient  vector  corresponding  to  the 
support  set  estimate  T' . 

To  proceed,  we  need  the  following  two  theorems. 

Theorem  3:  It  holds  that 


<,ll2< 


x/10  52k 

1  +  #2  K 


11*0 


2  ’ 


The  proof  of  the  theorem  is  postponed  to  Appendix  D. 
Theorem  4:  The  following  inequality  is  valid 


ll*o||2  < 


1  +  &3K 

1  —  S3  K 


2  • 


The  proof  of  the  result  is  deferred  to  Appendix  E. 

Based  on  Theorems  3  and  4,  one  arrives  at  the  result  claimed 
in  Equation  (3). 

Furthermore,  according  to  Lemmas  1  and  2,  we  have 


||yr||2  =  ||resid(y,$f)||2 

<  ||‘&T_j.Xo||2 

<  (1  +  S3k)  ck  ||x0||2  , 


and 


=  llresid  (y.*f)l|2 

^  1  —  2 S3k  ||^  „  || 

-  1-S3K 
>  (1  -  2 S3K)  || x0 1| 2  . 


Upon  combining  the  two  inequalities  described  above,  we 
obtain  the  following  upper  bound 

||yr||2  <  l  +^K  Ck  ||yr||2  • 

i  —  Z03k 

Finally,  elementary  calculations  show  that  when  ((3 k  <  0.06, 

1  +  S3k  , 

<  *■ 

which  completes  the  proof  of  Theorem  2. 


||x0||2  <  ck||x0||2,  (3) 

and 

\\fr\\2  <  ||yr||2  <  ||yr||2,  (4) 

where 

a/10  S3k 
Ck  =  i - x — ' 

To  prove  Theorem  2,  we  need  to  take  a  closer  look  at  the 
operations  executed  during  each  iteration  of  the  SP  algorithm. 
During  one  iteration,  two  basic  sets  of  computations  and  com¬ 
parisons  are  performed:  first,  given  T,  K  additional  candidate 
indices  for  inclusion  into  the  estimate  of  the  support  set  are 
identified;  and  second,  given  A',  K  reliable  indices  out  of  the 
total  2K  indices  are  selected  for  future  testing.  This  set  of 
candidate  indices  is  represented  by  T.  In  Subsections  IV-A 
and  IV-B,  we  provide  the  intuition  for  choosing  to  perform 
SP  support  reconstruction  according  to  these  rules.  Now,  let 


A.  Why  Does  Correlation  Maximization  Work  for  the  SP 
Algorithm? 

Both  in  the  initialization  step  and  during  each  iteration 
of  the  SP  algorithm,  we  select  K  indices  that  maximize 
the  correlations  between  the  column  vectors  and  the  residual 
measurement.  Henceforth,  this  step  is  referred  to  as  correlation 
maximization  (CM).  Consider  the  ideal  case  where  all  columns 
of  $  are  orthogonal2.  In  this  scenario,  the  signal  coefficients 
can  be  easily  recovered  by  calculating  the  correlations  (v, .  y)  - 
i.e.,  all  indices  with  non-zero  magnitude  are  in  the  correct  sup¬ 
port  of  the  sensed  vector.  Now  assume  that  the  sampling  matrix 
«t>  satisfies  the  RIP.  Recall  that  the  RIP  (see  Lemma  1)  implies 
that  the  columns  are  locally  near-orthogonal.  Consequently, 
for  any  j  not  in  the  correct  support,  the  magnitude  of  the 
correlation  (v7 ,  y)  is  expected  to  be  small,  and  more  precisely, 
upper  bounded  by  5k+ 1  ||x||2.  This  seems  to  provide  a  very 
simple  intuition  why  correlation  maximization  allows  for  exact 
reconstruction,  but  the  correct  problems  in  reconstruction  arise 


x'o  =  xt-t' 


2Of  course,  in  this  case  no  compression  is  possible. 
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Figure  5.  Correlation  maximization  works  in  the  SP  setting. 


due  to  the  following  fact.  Although  it  is  clear  that  for  all  j  (j  T, 
the  values  of  |(vj,y)|  are  upper  bounded  by  8k+i  ||x||,  it  may 
also  happen  that  for  all  i  £  T,  the  values  of  |(vi;y)|  are  small 
as  well.  Dealing  with  order  statistics  in  this  scenario  cannot 
be  immediately  proved  to  be  a  good  reconstruction  strategy. 
The  following  example  illustrates  this  point. 

Example  1:  Without  loss  of  generality,  let 
T  =  {1,  •  •  •  ,K}.  Let  the  vectors  v,;  (i  £  T)  be  orthonormal, 
and  let  the  remaining  columns  v7-,  j  £T,  of  <t>  be  constructed 
randomly,  using  i.i.d.  Gaussian  samples.  Consider  the 
following  normalized  zero-one  sparse  signal. 


y 


l 

Vk 


5> 

iST 


Then,  for  K  sufficiently  large, 

|(vj, y)|  =  <  1,  for  all  1  <  i  <  K. 

It  is  straightforward  to  envision  the  existence  of  a  j  ^  T  such 
that 


l(vj,y)|  «  6k+ i  > 


1 

4k' 


The  latter  inequality  is  critical,  because  achieving  very  small 
values  for  the  RIP  parameter  is  a  challenging  task. 

This  example  represents  a  particularly  challenging  case  for 
the  OMP  algorithm.  Therefore,  one  of  the  major  constraints 
imposed  on  the  OMP  algorithm  is  the  requirement  that 

max  | (v $ ,  y ) |  =  -L  >  max  |(Vj,y)|  «  SK+i- 

V  K  3Tt 


To  meet  this  requirement,  8k+ i  has  to  be  less  than  1  /y/K, 
which  decays  fast  as  K  increases. 

In  contrast,  the  SP  algorithm  allows  for  some  j  ^  T  to  be 
such  that 

max  |(vj,y)|  <  |(vj,y)| . 


As  long  as  Equation  (2)  holds,  the  indices  in  the  correct 
support  of  x,  which  account  for  the  most  significant  part  of 
the  energy  of  the  signal,  are  captured  by  the  CM  procedure. 
Detailed  descriptions  of  how  this  can  be  achieved  are  provided 
in  the  proofs  of  the  previously  stated  Theorems  5  and  3. 


Let  us  first  focus  on  the  initialization  step.  By  the  definition 
of  the  set  T  in  the  initialization  stage  of  the  algorithm,  the  set 
of  the  K  selected  columns  ensures  that 

||$fy||2  =  l(w,y)|2  >  (i  -$2k)  ||x||2.  (5) 

This  is  a  consequence  of  the  result  of  Theorem  5.  Now,  if  we 
assume  that  the  estimate  T  is  disjoint  from  the  correct  support, 
i.e.,  that  T (~)T  =  </>,  then  by  the  near  orthogonality  property 
of  Lemma  1,  one  has 


||$fy||2  <  <W||x||2. 

The  last  inequality  clearly  contradicts  (5)  whenever  82K  < 
83K  <  1/2.  Consequently, 


and  at  least  one  correct  element  of  the  support  of  x  is  in  the  set 
T.  This  phenomenon  is  depicted  in  Fig.  5  and  quantitatively 
detailed  in  Theorem  5. 

Theorem  5:  After  the  initialization  step,  one  has 


xt  n  t 


> 

2 


1  —  3  82k 

1  +  82  K 


2  ) 


and 


T—T 


< 

12  — 


2  K 


^2K 


1  +  * 


2  K 


The  proof  of  the  theorem  is  postponed  to  Appendix  C. 

To  study  the  effect  of  correlation  maximization  during  each 
iteration,  one  has  to  observe  that  correlation  calculations  are 
performed  with  respect  to  the  vector 

yr  =  resid  (y,  3>T) 

instead  of  being  performed  with  respect  to  the  vector  y. 
As  a  consequence,  to  show  that  the  CM  process  captures  a 
significant  part  of  residual  signal  energy  requires  an  analysis 
including  a  number  of  technical  details.  These  can  be  found 
in  the  Proof  of  Theorem  3. 


B.  Identifying  Indices  Outside  of  the  Correct  Support  Set 

Note  that  there  are  2K  indices  in  the  set  T' ,  among  which 
at  least  K  of  them  do  not  belong  to  the  correct  support  set  T. 
In  order  to  expurgate  those  indices  from  T\  or  equivalently, 
in  order  to  find  a  /f-dimensional  subspace  of  the  space 
span  (<3>t')  closest  to  y,  we  need  to  estimate  these  K  incorrect 
indices. 

Define  AT  =  T'  —  T.  This  set  contains  the  K  indices 
which  are  deemed  incorrect.  If  AT  fj  T  =  <j>,  our  estimate  of 
incorrect  indices  is  perfect.  However,  sometimes  AT  fj  T  yl  <f>. 
This  means  that  among  the  estimated  incorrect  indices,  there 
are  some  candidates  that  actually  belong  to  the  correct  support 
set  T.  The  question  of  interest  is  how  often  these  correct 
indices  are  erroneously  removed  from  the  support  estimate, 
and  how  quickly  the  algorithm  manages  to  restore  them  back. 

First,  we  claim  that  the  reduction  in  the  ||-||2  norm  induced 
by  such  erroneous  expurgation  is  small.  The  intuitive  expla¬ 
nation  for  this  claim  is  as  follows.  Let  us  assume  that  all  the 
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Figure  6.  The  projection  coefficient  vector  xj,  is  a  smeared  version  of  the 
vector  Xy- p|  T, . 


indices  in  the  support  of  x  have  been  successfully  captured,  or 
equivalently,  that  T  C  T'.  When  we  project  y  onto  the  space 
span  ^  can  be  shown  that  its  corresponding  vector  x'( 

satisfies 

Xp  =  XT/, 

and  that  it  contains  at  least  K  zeros.  Consequently,  the  K 
indices  with  smallest  magnitude  -  equal  to  zero  -  are  clearly 
not  in  the  correct  support  set. 

However,  the  situation  changes  when  T  </.  T',  or  equiva¬ 
lently,  when  T  —  T1  ^  </>.  After  the  projection,  one  has 

Xp  7^  xT'. 

The  projection  vector  xj,  can  be  viewed  as  a  smeared  version 
of  xt'  (see  Fig.  6  for  illustration):  the  coefficients  indexed 
by  elements  outside  the  support  of  x  may  become  non-zero; 
the  coefficients  indexed  by  elements  in  the  support  set  T  may 
experience  changes  in  their  magnitudes.  Fortunately,  the  level 
of  this  smear  is  proportional  to  the  norm  of  the  residual  signal 
Xg,  which  can  be  proved  to  be  small  according  to  the  analysis 
accompanying  Theorem  3.  As  long  as  the  smear  is  not  severe, 
the  largest  projection  coefficients  still  serve  as  good  estimates 
of  the  correct  signal  coefficients  restricted  to  T',  and  the 
correct  support  set  T.  This  intuitive  explanation  is  formalized 
in  the  previously  stated  Theorem  5. 


C.  Convergence  of  the  SP  Algorithm 

In  this  subsection,  we  upper  bound  the  number  of  iterations 
needed  to  reconstruct  an  arbitrary  IT-sparse  signal  using  the 
SP  algorithm. 

Given  an  arbitrary  IT-sparse  signal  x,  we  first  arrange  its 
elements  in  decreasing  order  of  magnitude.  Without  loss  of 
generality,  assume  that 

]®i|  >  M  >  •  •  •  >  \xk\  >  o, 


and  that  Xj  =  0,  W  j  >  K.  Define 


Pmin  •  — 


\xk\ 

l|x||2 


min  X; 

1  <i<K 


Let  7T.it  denote  the  number  of  iterations  of  the  SP  algorithm 
needed  for  exact  reconstruction  of  x.  Then  the  following 
theorem  upper  bounds  n;t  in  terms  of  ck  and  Amin-  It  can  be 
viewed  as  a  bound  on  the  complexity/performance  trade-off 
for  the  SP  algorithm. 

Theorem  6:  The  number  of  iterations  of  the  SP  algorithm 
is  upper  bounded  by 


?Tit  <  min 


/  -  log  Amin  1  1-5  ■  I<  \ 
V  -  log  CK  ’  -  log  CK  ) 


This  result  is  a  combination  of  Theorems  7  and  8,  described 
below. 

Theorem  7:  One  has 


«it  < 


—  log  Amin  i 

-  log  CK 


Theorem  8:  It  can  be  shown  that 


n it  < 


1.5  •  K 

-  log  CK  ' 


The  proof  of  Theorem  7  is  intuitively  clear  and  presented 
below,  while  the  proof  of  Theorem  8  is  more  technical  and 
postponed  to  Appendix  F. 

Proof  of  Theorem  7:  This  theorem  is  proved  by  a  contra¬ 
diction.  Let  T  be  the  estimate  of  T  after 

-  log  Ami„ 

-  log  CK 

iterations.  Suppose  that  T  T,  or  equivalently,  T  -  7  f  f. 
Then 


>  min  \xf\  =  Amin  ||x||2  . 

iGT 

However,  according  to  Theorem  2, 

||xT_j.||2  <  (ck)  ||x||  2 

—  CK  Amin  1 1 X 1 1  2  T  Amin  1 1 X 1 1  2  ; 

where  the  last  inequality  follows  from  the  assumption  that 
Ck  <  1-  This  contradiction  completes  the  proof.  ■ 

A  drawback  of  Theorem  7  is  that  it  sometimes  overestimates 
the  number  of  iterations,  especially  when  Amin  ^  1-  The 
example  to  follow  illustrates  this  point. 

Example  2:  Let  K  =  2,  X\  =  210,  £2  =  1,  £3  =  ■  ■  ■  = 
Xn  =  0.  Suppose  that  the  sampling  matrix  $  satisfies  the  RIP 
with 

_  v/10^3 k  _  1 

CK  ~  1-S3K  ~  2‘ 

Noting  that  Amin  2-10,  Theorem  6  implies  that 
nit  <  11. 

Indeed,  if  we  take  a  close  look  at  the  steps  of  the  SP  algorithm, 
we  can  verify  that 

7lit  <  1. 

After  the  initialization  step,  by  Theorem  5,  it  can  be  shown 
that 
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ll*oll2  < 


\J  4i 52k  +  8<5| 


2X 


1  +  5: 


<  cK  ||x||2  < 


2  K 


As  a  result,  the  estimate  T  must  contain  the  index  one  and 
||xo||2  —  1-  After  the  first  iteration,  since 

ll*o II 2  ^  n  ll*o II 2  <  \  <  min  | Si | , 

Z  Z  1 


we  have  T  C  T. 


This  example  suggests  that  the  upper  bound  in  Equation  (7) 
can  be  tightened  when  pm jn  <C  1.  Based  on  the  idea  behind 
this  example,  another  approach  to  upper  bounding  rqt  is 
described  in  Theorem  8  and  its  validity  proved  in  Appendix 
F. 

It  is  clear  that  the  number  of  iterations  required  for  exact 
reconstruction  depends  on  the  values  of  the  entries  of  the 
sparse  signal  itself.  We  therefore  focus  our  attention  on  the 
following  three  particular  classes  of  sparse  signals. 

1)  Zero-one  sparse  signals.  As  explained  before,  zero- 
one  signals  are  in  the  most  challenging  reconstruction 
category  for  the  well-known  OMP  algorithm.  However, 
this  class  of  signals  has  the  best  upper  bound  on  the 
convergence  rate  of  the  SP  algorithm.  Elementary  cal¬ 
culations  reveal  that  pm;n  =  1/sfK  and  that 

log  K 

Uit  ~  2  log(l/ ck)  ' 

2)  Sparse  signals  with  power-law  decaying  entries  ( also 
known  as  compressible  sparse  signals).  Signals  in  this 
category  are  defined  via  the  following  constraint 

|  Xi  |  A  cx  i  p  , 


for  some  constants  cx  >  0  and  p  >  1.  This  type  of 
signals  has  been  widely  considered  in  the  CS  literature, 
since  most  practical  and  naturally  occurring  signals 
belong  to  this  class  [13].  It  follows  from  Theorem  7 
that  in  this  case 


nit  < 


p  log  K 
log(l /cK) 


(l  +  o(l)), 


where  o  (1)  — >  0  when  K  — >  oo. 

3)  Sparse  signals  with  exponentially  decaying  entries.  Sig¬ 
nals  in  this  class  satisfy 


\Xi\  <  Cx 


for  some  constants  cx  >  0  and  p  >  0.  Theorem  6  implies 
that 


tin  < 


1.5  K 

log(l  /ck) 


if  0  <  p  <  1.5 
if  p  >  1.5 


where  again  o  (1)  — »  0  as  K  —t  oo. 

Simulation  results,  shown  in  Fig.  7,  indicate  that  the  above 
analysis  gives  the  right  order  of  growth  in  complexity  with 
respect  to  the  parameter  K.  To  generate  the  plots  of  Fig. 
7,  we  set  m  =  128,  N  =  256,  and  run  simulations  for 
different  classes  of  sparse  signals.  For  each  type  of  sparse 
signal,  we  selected  different  values  for  the  parameter  K,  and 


m=128,  N=256,  #  of  realizations=200 


Figure  7.  Convergence  of  the  subspace  pursuit  algorithm  for  different  signals. 


for  each  I\,  we  selected  200  different  randomly  generated 
Gaussian  sampling  matrices  $  and  as  many  different  support 
sets  T.  The  plots  depict  the  average  number  of  iterations 
versus  the  signal  sparsity  level  K,  and  they  clearly  show  that 
Tin  =  O  (log  (K))  for  zero-one  signals  and  sparse  signals 
with  coefficients  decaying  according  to  a  power  law,  while 
Ti[t  =  O  (K)  for  sparse  signals  with  exponentially  decaying 
coefficients. 

With  the  bound  on  the  number  of  iterations  required  for 
exact  reconstruction,  the  computational  complexity  of  the 
complete  SP  algorithm  can  be  easily  estimated.  In  each 
iteration,  CM  requires  mN  computations,  while  the  projec¬ 
tions  can  be  computed  with  marginal  cost  O  (Km)  by  the 
Modified  Gram-Schmidt  (MGS)  algorithm  [15].  Therefore, 
the  total  complexity  of  the  SP  algorithm  is  O  (rriN  log  K) 
for  compressible  sparse  signals,  and  it  is  upper  bounded  by 
O  (mNK)  for  arbitrary  sparse  signals. 

The  complexity  of  the  SP  algorithm  is  comparable  to  that  of 
OMP-type  algorithms.  For  the  standard  OMP  algorithm,  exact 
reconstruction  always  requires  K  iterations.  The  correspond¬ 
ing  complexity  is  O  ( KmN ).  For  the  ROMP  and  StOMP  algo¬ 
rithms,  the  challenging  signals  in  terms  of  convergence  rate  are 
the  sparse  signals  with  exponentially  decaying  entries.  When 
p  is  sufficiently  large,  it  can  be  shown  that  both  ROMP  and 
StOMP  also  need  O  (K)  iterations  for  reconstruction,  which 
implies  computational  complexity  of  the  order  of  O  (KmN). 

One  advantage  of  the  SP  algorithm  is  that  the  complexity  is 
reduced  to  O  (mN  log  K)  when  compressible  sparse  signals 
are  considered.  For  this  class  of  sparse  signals,  to  the  best  of 
the  author’s  knowledge,  there  is  no  known  formal  proof  that 
allows  one  to  bound  the  complexity  of  the  ROMP  and  StOMP 
algorithm. 

V.  Recovery  of  Approximately  Sparse  Signals 
from  Inaccurate  Measurements 

We  consider  first  a  sampling  scenario  in  which  the  signal 
x  is  /\ -sparse,  but  the  measurement  vector  y  is  subjected  to 
an  additive  noise  component,  e.  The  following  theorem  gives 
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a  sufficient  condition  for  convergence  of  the  SP  algorithm  in 
terms  of  the  RIP  parameter  S3  k  ,  as  well  as  an  upper  bounds  on 
the  recovery  distortion  that  depends  on  the  energy  (Z2-norm) 
of  the  error  vector  e. 

Theorem  9  (Stability  under  measurement  perturbations): 
Let  x  £  Rn  be  such  that  |supp(x)|  <  K,  and  let  its 
corresponding  measurement  be  y  =  4>x  +  e,  where  e  denotes 
the  noise  vector.  Suppose  that  the  sampling  matrix  satisfies 
the  RIP  with  parameter 

S3K  <  0.03.  (6) 

Then  the  reconstruction  distortion  of  the  SP  algorithm  satisfies 

||x-x||2  <  cK  || e||2  , 

where 

/  _  1  +  #3  K 

K  63 K  (1  —  S3k) 

The  proof  of  this  theorem  is  sketched  in  Section  V-A. 

We  also  study  the  case  where  the  signal  x  is  only  approx¬ 
imately  /f -sparse,  and  the  measurements  y  is  contaminated 
by  a  noise  vector  e.  To  simplify  the  notation,  we  henceforth 
denote  by  xk  the  vector  obtained  from  x  by  maintaining  the 
I\  entries  with  largest  magnitude  and  setting  all  other  entries 
in  the  vector  to  zero.  In  this  setting,  a  signal  x  is  said  to  be 
approximately  A’-sparse  if  x  —  xk  ^  0.  Based  on  Theorem 
9,  we  can  upper  bound  the  recovery  distortion  in  terms  of  the 
(1  and  £2  norms  of  x  —  xk  and  e,  respectively,  as  follows. 

Corollary  1:  ( Stability  under  signal  and  measurement  per¬ 
turbations)  Let  x  £  R-^  be  approximately  /\-sparse,  and  let 
y  =  <t>x  +  e.  Suppose  that  the  sampling  matrix  satisfies  the 
RIP  with  parameter 

Sqk  <  0.03. 

Then 


Figure  8.  Reconstruction  distortion  under  signal  or  measurement  perturba¬ 
tions:  both  perturbation  level  and  reconstruction  distortion  are  described  via 
the  i 2  norm. 

and  covariance  matrix  cr/Im,  where  Im  denotes  the  m  x 

to  identity  matrix. 

We  execute  the  SP  decoding  reconstruction  process  on  y,  500 
times  for  each  K,  of  and  of  .  The  reconstruction  distortion 
||x  —  x||2  is  obtained  via  averaging  over  all  these  instances, 
and  the  results  are  plotted  in  Fig.  8.  Consistent  with  the 
findings  of  Theorem  9  and  Corollary  1,  we  observe  that  the 
recovery  distortion  increases  linearly  with  the  f2-norm  of 
measurement  errors.  Even  more  encouraging  is  the  fact  that  the 
empirical  reconstruction  distortion  is  typically  much  smaller 
than  the  corresponding  upper  bounds.  This  is  likely  due  to 
the  fact  that,  in  order  to  simplify  the  expressions  involved, 
many  constants  and  parameters  used  in  the  proof  were  upper 
bounded. 


x  -  x 


^  C2K 


1  +  Sqk 


K 


The  proof  of  this  corollary  is  given  in  Section  V-B. 

Theorem  9  and  Corollary  1  provide  analytical  upper  bounds 
on  the  reconstruction  distortion  of  the  noisy  SP  version  of 
the  SP  algorithm.  In  addition  to  these  theoretical  bounds,  we 
performed  numerical  simulations  to  empirically  estimate  the 
reconstruction  distortion.  In  the  simulations,  we  first  select 
the  dimension  of  the  signal  x  to  N,  and  the  number  of 
measurements  to.  We  then  choose  a  sparsity  level  AT  such 
that  K  <  to/2.  Once  the  parameters  are  chosen,  an  to  x  iV 
sampling  matrix  with  standard  i.i.d.  Gaussian  entries  is  gen¬ 
erated.  For  a  given  K,  the  support  set  T  of  size  |T|  =  AT 
is  selected  uniformly  at  random.  A  zero-one  sparse  signal  is 
constructed  as  in  the  previous  section.  Finally,  either  signal  or 
a  measurement  perturbations  are  added  as  follows: 

1)  Signal  perturbations :  the  signal  entries  on  T  are  kept 
unchanged  but  the  signal  entries  out  of  T  are  perturbed 
by  i.i.d.  Gaussian  A/”(0,cr^)  samples. 

2)  Measurement  perturbations-,  the  perturbation  vector  e  is 
generated  from  a  Gaussian  distribution  with  zero  mean 


A.  Recovery  Distortion  under  Measurement  Perturbations 

The  first  step  towards  proving  Theorem  9  is  to  upper  bound 
the  reconstruction  error  for  a  given  estimated  support  set  T, 
as  succinctly  described  in  the  lemma  to  follow. 

Lemma  3:  Let  x  £  be  a  A'-sparse  vector,  ||x||0  <  K, 
and  let  y  =  3>x  +  e  be  a  measurement  for  which  <1?  £  Rmx  N 
satisfies  the  RIP  with  parameter  6k-  For  an  arbitrary  T  C 


{ 1 ,  -  •  •  ,  N}  such  that 


T 


<  K,  define  x  as 


Xf  =  $ty, 


and 


Then 


X{1,—  ,JV}-T  —  0- 


lx  -  x|L  <  - — K —  l|xo||o  +  1  +  t 3K  lien 


1  -8- 


3K 


l-$ 


3  K 


The  proof  of  the  lemma  is  given  in  Appendix  G. 

Next,  we  need  to  upper  bound  the  norm  ||xo||2-  To  achieve 
this  task,  we  describe  in  the  theorem  to  follow  how  ||xo||2 
depends  on  the  noise  energy  ||e||2. 
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Theorem  10:  Let  xo  =  xT_f,  Xg  =  xT_^-j,|j  T,^  and  Xo  = 
xT_T.  Suppose  that 

l|e||2  <  ||x0||2  ■  (7) 

1  °2K 

Then 

iixoii2  <  ii*oii2’  (8) 

and 

Furthermore,  if 

£3 k  <  0.03, 

one  has 

l|yr||2  <  ||yr||2- 

Proof:  The  upper  bounds  in  Inequalities  (8)  and  (9) 
are  proved  in  Appendix  H  and  I  respectively.  To  complete 
the  proof,  we  make  use  of  Lemma  2  stated  in  Section  II. 
According  to  this  lemma,  we  have 


B.  Recovery  Distortion  under  Signal  and  Measurement  Per¬ 
turbations 

The  proof  of  Corollary  1  is  based  on  the  following  two 
lemmas,  which  are  proved  in  [16]  and  [17],  respectively. 

Lemma  4:  Suppose  that  the  sampling  matrix  $  €  U.mxN 
satisfies  the  RIP  with  parameter  8k-  Then,  for  every  x  £ 
one  has 

ll$x||2  <  s/1  +  SK  (j|x||2  +  ^  llxll^  . 


Lemma  5:  Let  x  €  R“  be  A'-sparse,  and  let  x^  denote  the 
vector  obtained  from  x  by  keeping  its  K  entries  of  largest 
magnitude,  and  by  setting  all  its  other  components  to  zero. 

Th“  11,11. 

< 


||x-xA'||2  S 


2  sjK' 


To  prove  the  corollary,  consider  the  measurement  vector 
y  =  $x  +  e 

=  $x2A  +  $  (x  -  x2A)  +  e. 

By  Theorem  9,  one  has 

||x  -  x2A||2  <  C6k  (||$  (x  -  x2A)||2  +  ||e||2) , 


||yr||2  =  ||resid(y,$f)||2 

—  ||$T-TXT-t||2  I e II 2 

<  (1  +  5 3k)  I|x0||2  +  || e|| 2 

<  ^(1  +  <53A)  c'k  +  ^  _  ^2  ^  ||xo||2  , 


||yr||2  =  1 1 resid  (y,$f)||2 

-  \-Ssk  (H#tx°||  ~llell2) 

>  (i-2S3K  -j^;)  II*oIL  ■ 

Elementary  calculation  reveal  that  as  long  as  (>3A  <  0.03,  we 
have  ||yr||  <  ||yr||.  This  completes  the  proof  of  the  theorem. 


Based  on  Theorem  10,  we  conclude  that  when  the  SP 
algorithm  terminates,  the  inequality  (7)  is  violated  and  we 
must  have 

IMI2  >  -I  ^2  l|xo||2  ■ 

1  °3  K 

Under  this  assumption,  it  follows  from  Lemma  3  that 


|x-x||2  < 


1 


1  $3K 


1  —  83K  83K 

1  +  83  K  | 

83 K  (1  —  83k) 


1  +  83  K 
1  —  83K 


which  completes  the  proof  of  Theorem  9. 


||$(x-x2A)||2 


<  \A  +  6qk  ^I|x  -  x2A||2  +  - — 

Furthermore,  Lemma  5  implies  that 

l|x  -  x2A||2  =  || (x  -  xK)  -  (x  -  xk)k ||2 


< 


Therefore, 


2VK 


l|x-xA||i 


ll$(x-x2A)||2 


||x  -  X2A||1' 

V6K  , 


and 


|x  —  x2A||2  <  c2A  ^||e||2  +  s/l  +  8Sk -  y— 

which  completes  the  proof. 


VI.  Conclusion 

We  introduced  a  new  algorithm,  termed  subspace  pursuit, 
for  low-complexity  recovery  of  sparse  signals  sampled  by  ma¬ 
trices  satisfying  the  RIP  with  a  constant  parameter  <53A.  Also 
presented  were  simulation  results  demonstrating  that  the  re¬ 
covery  performance  of  the  algorithm  matches,  and  sometimes 
even  exceeds,  that  of  the  LP  programming  technique;  and, 
simulations  showing  that  the  number  of  iterations  executed 
by  the  algorithm  for  zero-one  sparse  signals  and  compressible 
signals  is  of  the  order  0(log  AT). 
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Appendix 

We  provide  next  detailed  proofs  for  the  lemmas  and  theo¬ 
rems  stated  in  the  paper. 

A.  Proof  of  Lemma  1 

1)  The  first  part  of  the  lemma  follows  directly  from  the 
definition  of  6K .  Every  vector  q  £  if  can  be  extended 
to  a  vector  q'  £  Rh  by  attaching  K'  —  K  zeros  to  it. 
From  the  fact  that  for  all  J  C  {I,--  -  ,7V}  such  that 
|  J\  <  K' ,  and  all  q'  £  WLK' ,  one  has 


(i  -  <M  llq'llz  <  ll^./q'lla  <  (i  +  llq'"2 


2  ’ 


it  follows  that 


(i  -  8k>)  ||q||a  <  Il*/q|l2  <  (i  +  &k>)  ||q||| 

for  all  |/|  <  I\  and  q  £  if.  Since  5  k  is  defined  as 
the  infimum  of  all  parameter  <5  that  satisfy  the  above 
relationship,  5k  <  5 k’  ■ 

2)  The  inequality 

|(<I>/a,  <frjb}|  <  <5|/|+|j|  || a|| 2  ||b||2 

obviously  holds  if  either  one  of  the  norms  ||a|j2  and 
||b||2  is  zero.  Assume  therefore  that  neither  one  of  them 
is  zero,  and  define 

a'  =  a/l!all2»b'  =  b/llbll2; 

x'  =  <t>/a,  y'  =  <t>jb. 

Note  that  the  RIP  implies  that 


2(1-<5|/|+|j|)  <  ||x'- 


/ 1|  2 

•y  ll2 


a 

b' 


and  similarly. 


2(1-<5|/|+|J|)  < 


a 

-b' 


<  2  (1  +  (5|/|+|j|) 


x  —  y  ll2 

2 

<  2(1  +  <Vi+|j|) 


(10) 


We  thus  have 


/„/  v/\  /  Ilx,  +  y1l2-llx'-y'll2  /  s 

(x  >y  /  s - ^ - <  0|/|+|j|, 


B.  Proof  of  Lemma  2 

1)  The  first  claim  is  proved  by  observing  that 

**,yr  =  (y  -  (*;*i)_1  $*! y 

=  $}y  -  =  0. 

2)  To  prove  the  second  part  of  the  lemma,  let 

yp  =  3>/xp,  and  y  =  3>jx. 

By  Lemma  1,  we  have 

HxJL  Hx||  2 

ll-d, 

Il2 


l(yP»y)l  —  ^i/[#iji  n^pii2  nAii2 

l|yp||2  ny  1 


<  <5|/|+|J| 


V1  -  <Vi  y/1  -  S\P 


<-<Wl  l|yP||2  l|y||2  • 


i-$ 


Since 


we  have 


\iMA 


(yP,y)  =  (yP,yP  +  yr)  =  ||yP||2, 


l|yP II 2  ^  nyiia  ■ 


m+Ni 

Furthermore,  since 

l|yr||2  =  l|y-yPll2  >  lly||2-  llypll2 

and  since 

l|yr||2  =  lly-ypll2  ^  lly||2  +  llypll2> 

one  can  show  that 

1  jm+Ni  <  llyrll  <  1  |  s\i\+\j\ 


1  ~5\ 

Observing  that 


\L+\J\ 


l-5i 


FI+NI 


I|y,ll2  +  llypll2  =  lly"2 


2  ’ 


we  finally  show  that 

<Vl+NI 


1  - 


l-5i 


f+NI 


|y||2  <  llyr||2  <  lly||2- 


C.  Proof  of  Theorem  5 

The  first  step  consists  in  proving  Inequality  (5),  which  reads 


-  (x',y')  < 

and  therefore 

K$/a,*jb)| 


|x'  -  y  2  -  ||x' 


y'llz 


NU^ 


=  l(x',y')l  <  <5|/|+|J|- 


Now, 


||^fy||2  >  (i-<W)||x||2. 
By  assumption,  |T|  <  K,  so  that 

||^y||2  =  ||$^TX||2>(l-^) 


12  > 


which  provides  the  desired  proof.  According  to  the  definition 

off, 


«W>||2  = 


max  ||q*  ($}$jb)||2 
q:  llq|l2=1 


2IN2 


11+01 

which  completes  the  proof. 


-  H*  /lJl+|JI  Hql 

q:  llq|l  1=1 
=  5, 


12  > 


l$fyll2  =  |^yEi<v-y>i2 

>  ||^y||2  >  (1  -  52K)\ 


12  ■ 


The  second  step  is  to  partition  the  estimate  of  the  support 
set  f  into  two  subsets:  the  set  Tf]T,  containing  the  indices 
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in  the  correct  support  set,  and  T  —  T,  the  set  of  incorrectly 
selected  indices.  Then 


< 

TJ  2  — 


< 


n  Ty 

n 


$ 


jp _ rpj 


+  $2K  ||x|!2  , 


where  the  last  inequality  follows  from  the  near-orthogonality 
property  of  Lemma  1. 

Furthermore, 


s  * 

\T'  =  f(jT" 

T" 

T 

* -  *o - * 

T  —  T' 

^ _  Xn 

"“X'c— * 
Tf\T" 

T-t 

Figure  9.  Illustration  of  sets  and  signal  coefficient  vectors  for  Theorem  3 


n  Ty 


< 


n  r^fnTxff|T 


+ 

<  (1  +  S'2k) 


**.  *  -X 

Tf|  T  T—T  T—T 


'’T  [~|  T 


+  &2K  llxll 


Combining  the  two  inequalities  above,  one  can  show  that 


<  (1  +  S2K) 


n  t 


+  2(52 K  Ilxl 


By  invoking  Inequality  (5)  it  follows  that 


Xq  =  tct-T',  is  the  part  of  the  signal  not  captured  by 
V. 

For  clarity,  some  of  the  sets  and  vectors  in  the  list  above  are 
depicted  in  Fig.  9  . 

With  the  above  notation,  the  main  step  of  the  proof  is  to 
show  that  CM  allows  for  capturing  a  significant  part  of  the 
residual  signal  power,  that  is, 

llxoll2  <  Cl  ||xo||2 


(1  -  S2K )  Hx||2  <  (1  +  &2k) 

Hence, 


^Tf]T 


-  2(5 2I<  Ilxll2  • 


^  1  -  362K  ||  || 
^nr||2  ^  1  +  §2K  x  ^ 

To  complete  the  proof,  we  observe  that 


T—T  | 


=  \  x  o  - 


sf  n  t 


< 


\/8S2k  +  4(5| 


2K 


1  +  S2k 


D.  Proof  of  Theorem  3 

The  proof  of  this  theorem  heavily  relies  on  the  following 
technical  (and  tedious)  notation,  some  of  which  has  been 
previously  described  in  the  paper,  but  is  repeated  in  this  section 
for  completeness: 

yr  =  resid  (y.d?^),  denotes  the  residue  of  the  projec¬ 
tion  of  y  onto  the  space  span  (3>f); 

xr  is  the  coefficient  vector  corresponding  to  yr,  i.e., 
Yr  =  ^T|j  fxri 

y0  =  T’y.  TxT  T,  is  the  component  of  the  measure¬ 
ment  which  has  not  been  captured  by  the  set  T; 

x0  =  xT  T,  denotes  the  part  of  the  signal  not  captured 
by  f; 

yo.p  =  proj  (yo,  &f)  denotes  the  projection  of  yo  onto 
span  ); 

xoj,  is  used  to  denote  the  projection  coefficient  vector 
corresponding  to  y0,P,  i-e.,  yo,P  =  3>fx o,P; 

T"  denotes  the  set  of  K  residual  indices  with  maximum 
correlation  magnitudes  |(vj,yr)|; 

y'c  =  $T|-|T"Xrp|  T"  denotes  the  component  of  the 
measured  vector  included  through  the  set  T" . 

x!c  =  X7,|'-|7'//,  denotes  part  of  the  sample  signal  sup¬ 
ported  on  T" . 

y'0  =  $t-T'xt-T',  corresponds  to  the  part  of  the 

measurement  vector  not  captured  by  T'  =  T  (J  T" . 


for  some  constant  C\.  Note  that  x0  is  composed  of  xj,  and  x'c, 
i.e., 

X0  =  [(Xq)*  ,(x'c)*]% 


so  that 


|xol|2  =  ||xo||2Hlx,J2. 


The  most  difficult  part  of  our  demonstration  is  to  upper  bound 

Itella- 

The  roadmap  of  the  proof  can  be  formed  by  establishing 
the  validity  of  the  following  four  claims. 

1)  If  we  write 


then 


where 


^T|JT  —  ’ 

yr  =  rpXf  . 

Xr  =  [xS,-xS,p]*  • 


We  claim  that 


Hxo,p||2  <  j-%—  llxo||2- 

-1-  —  02  K 

2)  It  holds  that 

||^T"yr||2  >  (1  -  2 S2k)  ||xo||2  ■ 

3)  The  corresponding  upper  bound  reads  as 

ll*r»yr||2  <  (1  +  <W)  ||X'J2  +  ||*o||2  . 


1  —  (5  2K 


4)  Finally, 


llxoll2  < 


VW5: 


2K 


1  +  <5  2K 


lxol 


2  ■ 


Proof:  The  claims  can  be  established  as  demonstrated 
below. 
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1)  It  is  clear  that 

yr  =  resicl  (y,  &t)  =  resid  (y0, 


4)  Combining  the  second  and  the  third  claims  of  the  proof, 
we  find  that 


=  y 


**yo 


Ccll2  =  HxTnT"||2 


—  (^'j1_'jhX0  ^j’X-O, 

=  [<l> T_f , 

As  a  consequence  of  the  RIP, 


p 

xo 

~Xg  p 


> 


1 


1  —  2$2K  — 


252  K  —  5? 


2  K 


1  +  52  K 

1  -  5 52K  +  35fK 

1  —  $2K 


1-5 ■ 


2  K 


xo| 


xo| 


Based  on  this  inequality,  we  can  show  that 


l|X0,p||2  = 


< 


1 


1-5 


K 


-62K  ||*0 1| 2  ^ 


62K 
1  -6- 


|x0| 


2K 


This  proves  the  stated  claim. 
2)  Note  that 


llxoll2  =  V  l|xo||2  -  llx,c||2 
<  l|x0||2 


'1- 


1  -  5 62K  +  35.? 


2K 


1  &2K 


yr  =  resid  (y,  G  span  (<&T(J  , 

and  that  yr  is  orthogonal  to  i>  r.  We  therefore  have 

f  ($rufx--) 


$ 


T-TJ 


^TU  f^r 


$ 


To  make  this  result  more  tractable  for  subsequent  anal¬ 
ysis,  we  observe  that 

(1  -  622K)2 -(1-562k  +  3622K)2 
<  (l  —  $2k)  ~~  (l  —  5 62K  +  5|k) 


>  (1  -  52k)  ||Xr  ||  2 

>  (1  -52k)  (||x0||2  -  ||x0,p||2) 

>  (1  -  252jf)  || x0 1|  2  . 

Since  the  set  T"  is  chosen  so  as  to  maximize  the 
correlations  with  the  residual  vector,  we  can  show  that 


<  1052x  (1  —  62k)  1 


=  m2K  -  29 6\k  +  mlK 

2 


so  that 


'■oil 2  - 


a/10 &2K  n  ,  || 
<  - - ; -  llxol 


1  +  62K 


as  claimed. 


3>T'yr||2  > 


&*T_fyr 


>  (1  -  2 52k)  11*0 II 2  , 


which  completes  the  proof. 

3)  Using  the  decomposition 

y r=  [^T-T>^t]  [X0’— X0,p]  , 

we  can  show  that 

>T"yr||2  —  1 1  ^T— TXT— T  1 1 2  ||  ^TX0,p  ||  2 

1  5l,<  II** . 


< 

\\&*T, 

T> 

< 

\\®T, 

T> 

1-5 


2  K 


(ID 


Since  Tf]T"  =  </>,  we  can  partition  the  set  T  —  T  as 
T  -  T  =  (T  P |  T")  {j(T-T-T" 


Then 


' J  'll  p rpX.p p 1  ^ 


< 

< 


^ . j  |  |  rp  /  /  rp  rp^i.rp  pi 


**  .  * 

T—T—T "  ^  T—T^T—T 


& *n 


Tfl  T"^Tf]  T"xrfl  T" 


+ 


p  pj  pn^6p _ rp _ p//Xp _ rp _ p/j 


+  62K  11*0  ll2 

<  (1  +  52k)  ||xTp|  pn  ||2  +  S2K  11*0 II 2  +  ^2 K  11*0 1|2  • 


(12) 


Upon  substituting  Inequality  (12)  into  (11),  we  obtain 

2 62K  ~  5? 


E.  Proof  of  Theorem  4 

As  in  the  previous  subsection,  we  first  introduce  the  notation 
followed  in  this  part  of  the  manuscript: 

=  $T_T  >xt-T'  denotes  the  part  of  the  measurement 
vector  not  captured  by  T'\ 

=  Xp_pj  denotes  part  of  the  signal  x  not  captured 
by  T'\ 

=  proj  (yg,  denotes  the  projection  of  y'0  onto 
span  (4>T')i 

denotes  the  projection  coefficient  vector  correspond¬ 
ing  to  yo,p.  i-e-  yo.p  =  ^t'XqiP; 

=  proj  (y',  t&T’/)  denotes  the  projection  of  y  onto 
span  (4>T')i 

stands  for  the  projection  coefficient  vector  corre¬ 
sponding  to  y'„  i.e.,  yp  =  4>T'Xp; 

T  denotes  the  estimate  of  the  K  indices  in  T  upon 
completion  of  an  iteration  (i.e.,  the  set  of  those  I\ 
indices  that  are  deemed  sufficiently  reliable); 

AT  =  T'  —  T  consists  of  the  set  of  indices  estimated  to 
be  incorrect; 

Axo  =  xrn  a, 7-  denotes  the  signal  component  erro¬ 
neously  removed  from  the  list  at  the  given  iteration; 
x0  =  xT_rp,  denotes  the  signal  component  not  captured 
by  T. 

As  for  the  previous  proof,  the  sets  and  signal  coefficient  vec¬ 
tors  introduced  above  are  illustrated  in  Fig.  10.  The  previously 


yo 


xo 

yo,P 

X0,p 

y  p 

f 


x 


p 


Q*Tny  ||  ^  +  52k)  ||xTnT//||  |  ”2 k  ||£o||  studied  concept  of  the  smear  of  a  vector  is  also  depicted  in 

2  1  —  62K  the  same  figure. 
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'y  AT  f  AT 

V 

T 

* - Axo - * 

Tf|AT 

< -  xo - 

t-v  ; 

T-f  ! 

Figure  10.  Illustration  of  sets  and  signal  coefficient  vectors  for  Theorem  4 


To  prove  the  theorem,  we  again  proceed  with  establishing 
the  validity  of  four  different  claims,  listed  below. 

1)  It  can  be  shown  that 

$3K  „ 


< 


2)  For  any  index  i  g  T{JT", 


'■0112  ' 


rv/\  =  Jxi+  (xoJi  if 
[ph  (KJ,  ^T‘ 


3)  One  has 

4)  And,  finally. 


Axo||2  <  2  x0  p  2  . 


iixoii2  <  ? + t3A  iixoii2- 


i — <>3K 

Proof:  The  proofs  proceed  as  follows. 

1)  To  prove  the  first  claim,  we  only  need  to  note  that 


0,p||2 


< 


($5v$t0  ($T-T'xo) 

1  ~  S3k 


-fox  ||xoll2  < 


'■0112 


1  ~S2K  "  Ull“'  -1-63K 

2)  This  claim  is  proved  by  partitioning  the  entries  of  the 
sampling  matrix  as  follows.  First,  we  write 

|~)  T'  1  T'\  • 


Then,  we  observe  that 


3)  As  described  before,  if  T  C  T',  then  AT  P|  T  =  <f>  and 
||  Axo  || 2  =  0-  However,  if  T  —  T1  <f>,  the  projection 
coefficients  is  a  smeared  version  of  xT> .  By  the 
second  claim  of  this  proof,  the  smear  is  simply  Xq 
and  its  energy  equals  ||xqp||9. 

In  what  follows,  we  first  show  that  the  energy  of  the 
projection  vector  x'p  restricted  to  AT  is  smaller  than 
the  energy  of  the  smear,  i.e., 


(xp) 


PJ  AT 


< 

2 


X 


0  ,p 


2  ’ 


Consider  an  arbitrary  index  set  AT"  C  T'  of  cardinality 
K  that  is  disjoint  from  T,  AT"  fj  T  =  <j>.  Such  a  set 
AT"  exists  because  |T'  —  T|  >  K.  By  the  second  claim 
in  this  proof. 


(xp) 


PJ  AT'1 


=  JE  « 


ieAT" 


=  JE  Kp) ■  < IK 


p  M  2  ■ 


iGAT" 


Since  AT  is  chosen  to  contain  the  K  projection  coeffi¬ 
cients  with  the  smallest  magnitudes. 


(xp) 


PJ  AT 


< 


(xp) 


PJ  AT'1 


^  KpL- 


Next,  we  decompose  the  vector  (x'p)  AT  into  a  signal 
component  and  a  smear  component.  Then 


(xp) 


PJ  AT 


XAT  +  (Kp) 


>  ||xat||2  - 


PJ  AT 


0  ,p)  /\T 


We  also  have 


AxqIIo  =  ||xat||2  < 


(xp) 


PJ  AT 


(x0,p) 


PJ  AT 


<  2  1 1  Xn 


0,p|l2  ’ 


which  completes  this  part  of  the  proof. 

4)  This  claim  is  proved  by  combining  the  first  three  parts, 
and  it  results  in 


T'xTfl  T> 

=  [^Tf|  T’i  &T-T'] 


xTf|  T> 

0 


xTf|  T> 

0 


Consequently, 


4  = 


xTfl  T> 

0 


0,p 


xTfl  T> 

0 


0,p> 


which  establishes  the  stated  result. 


llxo||2  < 
< 

< 


II  Ax0 1|2  +  llxoll2 
2  1 1  xo,p  1 1 2  Hxoll2 


2S3k 
1  —  53k 
1  +  S3k 
1  —  53k 


'-0II2 


F.  Proof  of  Theorem  8 

Without  loss  of  generality,  assume  that 

\xi\  >  \x2\  >■■■>  \xk\  >  o. 

The  following  iterative  algorithm  is  employed  to  generate  a 
partition  of  the  support  set  T  that  will  establish  the  correctness 
of  the  claimed  result. 
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Algorithm  2  Partitioning  of  the  support  set  T 
Initialization: 

Let  Ti  =  {1},  i  =  1  and  j  =  1. 

Iteration: 

If  i  =  AT,  quit  the  iterations;  otherwise,  continue. 

If  „  „  1 

||x{i+l,-"  ,K}  1 1 2  —  2  ’ 

then  we  set  Tj  =  Tj  (J  {i  +  1};  otherwise,  we 
have  ^ 

||x{i+l,---  ,K}  || 2  ^  2  ’ 

and  we  set  j  =  j  +  1  and  Tj  =  {*  +  1}. 

Let  i  =  i  +  1.  Continue  with  a  new  iteration. 


Suppose  that  after  the  iterative  partition,  we  have 

T  =  Tl  (J  ?2  (J  ■  ■ '  U 

where  .7  <  K  is  the  number  of  the  subsets  in  the  partition. 
Let  Sj  =  \Tj\ ,  j  =  1,  •  •  •  ,  J.  It  is  clear  that 


j 

Y  ai  =  A" 

7=1 

Then  Theorem  8  is  proved  by  invoking  the  following  lemma. 
Lemma  6: 

1)  For  a  given  j,  let  \Tj\  =  s,  and  let 

Tj  =  {i,i  +  1,  •  ■  ■  ,  i  +  s  —  1}  . 

Then, 

\xi+s—i—k I  <  3fe  |xi+s_i| ,  for  all  0  <  k  <  s-1,  (13) 


and  therefore 


2)  Let 


\xi+s— 1|  ^  ,^s  x{i,-  ,K}  2' 


log  2  —  Sj  log  3 


(14) 


(15) 


log  CK 

where  [•]  denotes  the  ceiling  function.  Then  for  any 
1  <  jo  <  J,  after 

7o 

Yni 

7=1 

iterations,  the  SP  algorithm  has  the  property  that 

7o 

U  Tj  C  f .  (16) 

7=1 

More  specifically,  after 

-  ~ 

7=1 


1.5  •  A" 


log  CK 

iterations,  the  SP  algorithm  guarantees  that  T  C  T. 


(17) 


Proof:  Both  parts  of  this  lemma  are  proved  by  mathemat¬ 
ical  induction  as  follows. 


1)  By  the  construction  of  Tj, 

||X{i+S,-  ,K}  1 1  2  —  2  la'*+s_ll  ' 

On  the  other  hand, 

2  la'*+«— 2|  —  ||X{i+s-l,-"  ,k}  1 1 2 

^  llx(  i+s,”*  ,K}  ||  2  4“  \xi+s— 1 1 

3  ,  ~ 

—  2  la'*+s— 1|  • 

It  follows  that 

| — 2 1  3  |ah+s— 1|  • 

Now  suppose  that  for  any  1  <  fc0  <  s  -  1, 

|a;i+s_i_fc|  <  3fc  |arj+s_i|  for  all  1  <  k  <  k0  -  1. 


Then, 

2  \xi+s-l-k0  I  —  ||x{i+s  —  feo,--- ,K}  || 

—  fco  |  4*  *  *  *  T  |x-i+s— 1 | 

+  ||x{i+s,-,K}|| 

<  ^3feo  1  H - ^  +  2)  l:Ci+s-1l 

3fc° 

—  l^i+s-ll  • 

This  proves  Equation  (13)  of  the  lemma.  Inequality  (14) 
then  follows  from  the  observation  that 

||x{v--  ,K}  ||  2  —  l^i  I  + - F  l^i+s-ll  +  ||x{i+s,---  ,K}  ||  2 

<  ^3S  1  + - ^  +  2) 

3s 

—  l^i+s-1 1  ■ 

2)  From  (15),  it  is  clear  that  for  1  <  j  <  J, 

2 

cJ  <  — ■ 

A  _  3sj 

According  to  Theorem  2,  after  ni  iterations, 
ll*ol|2<^!lxl!2. 

On  the  other  hand,  for  any  i  e  Ti,  it  follows  the  first 
part  of  this  lemma  that 

W  >  \xSl  \  >  ^  Ilxl|  ■ 

Therefore, 

Ti  C  T. 

Now,  suppose  that  for  a  given  j0  <  J,  after  'YjjL-^  rij 
iterations,  we  have 

7o— 1 

U  T:>  C  f- 

7=1 

Let  T0  =  UjS1  Tj.  Then 

llxo||2  =  ||xT-t||2  —  llxT-Tol|2  • 
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Denote  the  smallest  coordinate  in  Tj0  by  i ,  and  the 
largest  coordinate  in  Tj0  by  k.  Then 

2  I,  ii  2 

\xk\  -  3^"  IIxt-t0||2- 

After  rij0  more  iterations,  we  obtain  T'  and  Xq.  Conse¬ 
quently, 

llXoll2  <  llxo||2  <  7^-  I|xt-T0||2  <  \xk\  ■ 

As  a  result,  we  conclude  that 

T  C  T 

±3o  -1 

after  rij  iterations,  which  proves  inequality  (16). 

Now  in  order  to  ensure  that  T  C  T,  the  SP  algorithm 
needs  at  most 


||x-x||2  <  ||xf -$ly||2 
=  1 1  Xj,  -  4>t  (3>TxT  +  e)| 


t-t||2 


^T-T  ||2 


<  xf  -  4>t  ($TxT)  +  3>te 


—  ||xf  ($Tnrxrnf 
+  1 1  <E*  1 1 


|  t  T—T  T—T | 
s/l  +  &K  |,  ||  ii 

(  &2K  ,  ,\ 


t-t||2 


/TTS^ 


H.  Proof  of  Inequality  (8) 

Following  the  same  notations  outlined  in  Section  D,  we  have 

ll*T'yr||2>  ^T^TUfXr||2-||^e||2 

—  *^T-f^T\J  fXr  2  —  \/l  +  8k  ||e||2 


\  ^  \  ^  Si  log  3  —  log  2  +  1 

^  -log  cK 

K  log  3  +  J  (1  —  log  2) 

-  log  CK 

IC  (log  3  +  1  -  log  2)  K  ■  1.5 
log  Ck  -  -log  CK 

iterations.  This  completes  the  proof  of  the  last  claim. 


G.  Proof  of  Lemma  3 

The  claim  in  the  lemma  is  established  through  the  following 
chain  of  inequalities. 


>  (1  -  2 82K)  ||x0||2  -  \/l  +  5k  ||e||2 

>  (1  -  2 S2K)  ||x0||2  -  ^1  +  ||ej|2  . 

On  the  other  hand, 

||$T'yr||2  <  ^T'^TUfXr  0+II^T'ell2 

<  (1  +  S2K)  HxTr|T'||,2  +  ^K_  ||x0||2 

+  +  2^2K^J  I e  1 1  2  • 

By  combining  these  two  bounds  we  obtain 


II  1  —  562k  +  3(5|k  2  +  (W  I,  I, 

l^nr-ll,  > - - IM*  -  »e«» 


1  —  5 62k  +  3<5fK  m  ~  ||  „  n  I, 

>  - 7— fa - l|x0||2  -  2  ||e|| 2 

1  °2I< 

Recall  next  the  result  in  Inequality  (7),  stating  that 

||e||2  <  6^K2  ||x0||2. 

1  °2K 

The  two  above  inequalities  imply  that 

H  H  1  -  7 S2k  +  8\K  „ 

llxrnr'||2  >  - 7— n - !Ixo||2  • 

±  (In  ts- 


Therefore, 


K||2  < 


\/(l  -  SIk)2  -  (1  -  7S2K  +  522Kf 
1  —  &2K 

\jlL82K  (l  -  (1  +  f)  $2 1<  +  6%k) 

1  -  8  2  K 

\A6(W  (1  -  2S2k  +  8'1K)  ||a  N 

1  _  A2  llxolla 

1  °2I< 

4V8?K  m /.  „ 


i»+(r77  +  1J»x^»  +  T3lfW2 
-1  -«52AJX^II2+  !-Sk  l|e|l2‘ 

Note  that  the  next  to  last  inequality  is  a  consequence  of  the 
fact  that 


claimed. 


(^Tf|  fXTf|T 


By  relaxing  the  upper  bound  in  terms  of  replacing  52k  by 
83K,  we  obtain 

l|x  -  ^  -  r^+7  n^ll2  +  7+7 l|e|'2' 

This  completes  the  proof  of  the  lemma. 


I.  Proof  of  Inequality  (9) 

We  start  by  first  upper  bounding  the  norm  1 1  Xq  p  1 1 0  as 
outlined  below. 


($T-f  u  T'xo  +  e)  2 

-  yjT'^T-T (JT'X0  2+  $f|JT/e  2 

,  83K  ||  ,  ||  ,  a/1  +  83K  ||  || 

-r^wl|Xoll2+T^riie|i2 

,  83K  ||  / 1|  .  1  +  28sk  I,  I, 
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Then,  similar  type  of  arguments  as  those  used  in  Section  E, 
establish  that 


llxo 


l2<2 


0,p||2  ’ 


'■0112 


< 

< 


1  +  &3K 
1  —  $3K 

4v^ 

1  —  S3  K 


Klla 

llxo||2 


2  +  S3K 

1  —  S3K 

2  +  S3k 
1  —  S3K 


l|e||2 

H2. 


Recalling  the  assumption  in  (7),  we  arrive  at 


11-  11  -  4\/S3k  I,-  I,  2  +  S3k  S3k 

llxo||2  <  t - j—  llxo||2  ^ - 


1  —  S3K 

,  4 \/%K  ||  -  | 
<  - - —  llxo| 


1  —  S3K  1  —  siK 
2(W 


lxo| 


1  —  S3K 

thereby  proving  the  claimed  result. 


2  t  77. - 7 - 72  Hxoll2  > 

(1  -  03 K) 


2 
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